4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem
As an entry for the 2012 Gordon-Bell performance prize, we report performance
results of astrophysical N-body simulations of one trillion particles performed
on the full system of K computer. This is the first gravitational trillion-body
simulation in the world. We describe the scientific motivation, the numerical
algorithm, the parallelization strategy, and the performance analysis. Unlike
many previous Gordon Bell prize winners that used the tree algorithm for
astrophysical N-body simulations, we used the hybrid TreePM method, in which
the short-range force is calculated by the tree algorithm and the long-range
force is solved by the particle-mesh algorithm, at a similar level of accuracy.
We developed a highly-tuned gravity kernel for short-range forces, and a novel
communication algorithm for long-range forces. The average performance on
24,576 and 82,944 nodes of the K computer is 1.53 and 4.45 Pflops,
respectively, corresponding to 49% and 42% of the peak speed.

Comment: 10 pages, 6 figures, Proceedings of Supercomputing 2012
(http://sc12.supercomputing.org/), Gordon Bell Prize Winner. Additional
information is at http://www.ccs.tsukuba.ac.jp/CCS/eng/gbp201
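The short-range/long-range split behind a TreePM code can be illustrated with the standard erfc-based force decomposition. A minimal sketch, where the splitting scale r_s and the function names are illustrative and not taken from the paper:

```python
import math

def short_range_force(r, r_s=1.0, G=1.0, m=1.0):
    """Magnitude of the short-range part of the gravity from a point
    mass m at distance r, in the standard erfc-based TreePM split.
    This rapidly decaying piece is what the tree code sums directly."""
    x = r / (2.0 * r_s)
    return (G * m / r**2) * (math.erfc(x)
            + (r / (r_s * math.sqrt(math.pi))) * math.exp(-x * x))

def long_range_force(r, r_s=1.0, G=1.0, m=1.0):
    """Smooth long-range remainder, handled by the particle-mesh
    solver; by construction short + long = the full Newtonian force."""
    return G * m / r**2 - short_range_force(r, r_s, G, m)
```

At r much larger than r_s the short-range term decays as exp(-r^2/4r_s^2), so the tree walk can be truncated at a few r_s while the particle-mesh part carries the rest.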
N-body simulation for self-gravitating collisional systems with a new SIMD instruction set extension to the x86 architecture, Advanced Vector eXtensions
We present a high-performance N-body code for self-gravitating collisional
systems accelerated with the aid of a new SIMD instruction set extension of the
x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the
Streaming SIMD Extensions (SSE). With one processor core of Intel Core i7-2600
processor (8 MB cache and 3.40 GHz) based on Sandy Bridge micro-architecture,
we implemented a fourth-order Hermite scheme with individual timestep scheme
(Makino and Aarseth, 1992), and achieved a performance of 20 giga
floating-point operations per second (GFLOPS) in double precision, which is
two times and five times higher, respectively, than that of a previously
developed code implemented with the SSE instructions (Nitadori et al., 2006b)
and that of a code implemented without any explicit use of SIMD instructions
on the same processor core. We have parallelized the code using
the so-called NINJA scheme (Nitadori et al., 2006a), and achieved 90 GFLOPS
for a system containing more than N = 8192 particles with 8 MPI processes on
four cores. We expect to achieve about 10 teraflops (TFLOPS) for a
self-gravitating collisional system with N ≃ 10^5 on massively parallel
systems with at most 800 Sandy Bridge cores. This performance will be comparable
to that of Graphic Processing Unit (GPU) cluster systems, such as the one with
about 200 Tesla C1070 GPUs (Spurzem et al., 2010). This paper offers an
alternative to collisional N-body simulations with GRAPEs and GPUs.

Comment: 14 pages, 9 figures, 3 tables, accepted for publication in New
Astronomy. The code is publicly available at
http://code.google.com/p/phantom-grape
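The compute-bound inner loop that the AVX kernel accelerates is the pairwise acceleration-and-jerk summation of the fourth-order Hermite scheme (Makino and Aarseth, 1992). A minimal pure-Python sketch of one shared-timestep Hermite step — the actual code uses individual block timesteps and AVX intrinsics, and the function names here are illustrative:

```python
import numpy as np

def acc_jerk(pos, vel, mass, eps2=0.0):
    """Direct-summation acceleration and jerk for all particles: the
    O(N^2) kernel that the SIMD code evaluates with AVX intrinsics."""
    acc = np.zeros_like(pos)
    jerk = np.zeros_like(vel)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i == j:
                continue
            dr = pos[j] - pos[i]
            dv = vel[j] - vel[i]
            r2 = dr @ dr + eps2          # softened squared distance
            r3 = r2 * np.sqrt(r2)
            rv = (dr @ dv) / r2
            acc[i] += mass[j] * dr / r3
            jerk[i] += mass[j] * (dv - 3.0 * rv * dr) / r3
    return acc, jerk

def hermite_step(pos, vel, mass, dt, eps2=0.0):
    """One fourth-order Hermite step (shared timestep for clarity)."""
    a0, j0 = acc_jerk(pos, vel, mass, eps2)
    # Predictor: Taylor expansion through the jerk.
    pos_p = pos + vel * dt + a0 * dt**2 / 2 + j0 * dt**3 / 6
    vel_p = vel + a0 * dt + j0 * dt**2 / 2
    a1, j1 = acc_jerk(pos_p, vel_p, mass, eps2)
    # Corrector (Makino & Aarseth 1992 form): fourth-order accurate.
    vel_c = vel + (a0 + a1) * dt / 2 + (j0 - j1) * dt**2 / 12
    pos_c = pos + (vel + vel_c) * dt / 2 + (a0 - a1) * dt**2 / 12
    return pos_c, vel_c
```

The corrector reuses the predicted-point acceleration and jerk to interpolate the higher derivatives, which is what lifts the scheme to fourth order without storing a history of previous forces.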
NBODY6++GPU: Ready for the gravitational million-body problem
Accurate direct N-body simulations help to obtain detailed information
about the dynamical evolution of star clusters. They also enable comparisons
with analytical models and Fokker-Planck or Monte Carlo methods. NBODY6 is a
well-known direct N-body code for star clusters, and NBODY6++ is the extended
version designed for large-particle-number simulations on supercomputers. We
present NBODY6++GPU, an optimized version of NBODY6++ with hybrid
parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large
direct N-body simulations, and in particular to solve the million-body
problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as
well as the first results from a simulation of a realistic globular cluster
initially containing a million particles. For million-body simulations,
NBODY6++GPU running on 320 CPU cores and 32 NVIDIA K20X GPUs is substantially
faster than NBODY6. With this computing-cluster specification, simulations of
million-body globular clusters including primordial binaries require about an
hour per half-mass crossing time.

Comment: 13 pages, 9 figures, 3 tables
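For scale on the "about an hour per half-mass crossing time" figure: the half-mass crossing time is of the order of the dynamical time at the half-mass radius, sqrt(r_h^3 / (G M)). A minimal sketch with assumed, illustrative cluster parameters (not taken from the paper):

```python
import math

G = 4.301e-3            # gravitational constant in pc (km/s)^2 / Msun
PC_KMS_IN_MYR = 0.9778  # 1 pc / (km/s) expressed in Myr

def dynamical_time_myr(m_total_msun, r_h_pc):
    """Dynamical time sqrt(r_h^3 / (G M)) at the half-mass radius; the
    half-mass crossing time is of this order, up to an O(1) factor that
    depends on the exact convention used."""
    return math.sqrt(r_h_pc**3 / (G * m_total_msun)) * PC_KMS_IN_MYR

# Hypothetical million-body cluster: M = 6e5 Msun, r_h = 4 pc
t_dyn = dynamical_time_myr(6.0e5, 4.0)  # roughly 0.15 Myr
```

Since the half-mass relaxation time is roughly N / (8 ln N) crossing times, a relaxation-driven evolution at N = 10^6 spans thousands of crossing times — months of wall-clock time at an hour each.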
6th and 8th Order Hermite Integrator for N-body Simulations
We present sixth- and eighth-order Hermite integrators for astrophysical
N-body simulations, which use the derivatives of accelerations up to second
order ({\it snap}) and third order ({\it crackle}). These schemes do not
require previous values for the corrector, and require only one previous value
to construct the predictor. Thus, they are fairly easy to implement. The
additional cost of the calculation of the higher order derivatives is not very
high. Even for the eighth-order scheme, the number of floating-point operations
for force calculation is only about two times larger than that for the
traditional fourth-order Hermite scheme. The sixth-order scheme is better than
the traditional fourth-order scheme in most cases. When the required accuracy
is very high, the eighth-order one is the best. These high-order schemes have
several practical advantages. For example, they allow a larger number of
particles to be integrated in parallel than the fourth-order scheme does,
resulting in higher execution efficiency in both general-purpose parallel
computers and GRAPE systems.

Comment: 21 pages, 6 figures, accepted by New Astronomy
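The predictor described above, which needs only the current derivatives through snap and crackle, is a plain Taylor expansion. A minimal scalar sketch (a real code applies this per component; the function name is illustrative):

```python
def hermite8_predict(x, v, a, j, s, c, dt):
    """Taylor-series predictor using the acceleration derivatives up to
    snap (s) and crackle (c), as in the sixth/eighth-order Hermite
    schemes; no history of previous steps is required."""
    # Horner evaluation of x + v*dt + a*dt^2/2 + j*dt^3/6 + s*dt^4/24 + c*dt^5/120
    x_p = x + dt * (v + dt * (a / 2 + dt * (j / 6 + dt * (s / 24 + dt * c / 120))))
    v_p = v + dt * (a + dt * (j / 2 + dt * (s / 6 + dt * c / 24)))
    return x_p, v_p
```

For a trajectory whose derivatives beyond crackle vanish (a fifth-degree polynomial in time), this predictor is exact, which is a convenient correctness check.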